Declustering Large Multidimensional Data Sets for Range Queries over Heterogeneous Disks

Authors

  • Jonghyun Lee
  • Marianne Winslett
  • Xiaosong Ma
  • Shengke Yu
Abstract

Declustering is a technique to distribute data sets over multiple disks so that future retrievals can be well balanced over the disks and be performed in parallel. Although disk heterogeneity often exists in systems like clusters, most work on declustering has focused only on homogeneous environments. In this work, we investigate the declustering problem for a heterogeneous disk environment using virtual servers, and propose novel approaches for deciding the number of virtual servers and the mapping between virtual servers and physical disks. Our experimental results show that by combining our algorithm for choosing the number of virtual servers with a greedy algorithm for mapping virtual servers to disks, users can expect range query retrieval performance within 4% of the optimum achievable in practice on average, in all configurations studied. Compared to the intuitively natural approach to the problem based on randomly assigning virtual servers to physical disks, this represents an improvement of 8–31% in average fetch ratio, as well as a 21–42% reduction in the standard deviation of retrieval performance for small queries.
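To make the idea of mapping virtual servers to heterogeneous disks concrete, here is a minimal sketch. The abstract does not spell out the greedy criterion the authors use, so this is not their algorithm: it assumes equally sized virtual servers and a simple list-scheduling heuristic that hands each virtual server to the disk that would finish its share soonest. The function name `greedy_map` and the `disk_speeds` parameter (relative disk bandwidths) are hypothetical.

```python
import heapq


def greedy_map(num_virtual_servers, disk_speeds):
    """Assign equally sized virtual servers to disks, one at a time.

    Assumption (not from the paper): each virtual server goes to the disk
    whose projected load, (servers already assigned + 1) / speed, is lowest.
    Returns (mapping, counts): mapping[v] is the disk chosen for virtual
    server v, and counts[d] is the number of virtual servers on disk d.
    """
    counts = [0] * len(disk_speeds)
    # Heap entries: (projected time if one more server is added, disk index).
    heap = [(1.0 / speed, disk) for disk, speed in enumerate(disk_speeds)]
    heapq.heapify(heap)

    mapping = []
    for _ in range(num_virtual_servers):
        _, disk = heapq.heappop(heap)
        mapping.append(disk)
        counts[disk] += 1
        # Re-insert the disk with its projected time for the next server.
        heapq.heappush(heap, ((counts[disk] + 1) / disk_speeds[disk], disk))
    return mapping, counts


if __name__ == "__main__":
    # Three disks with relative bandwidths 1 : 2 : 3 and six virtual servers.
    mapping, counts = greedy_map(6, [1.0, 2.0, 3.0])
    print(counts)
```

Under these assumptions a faster disk receives proportionally more virtual servers, so a range query that touches every virtual server keeps all disks busy for about the same time. The random assignment that the abstract uses as a baseline ignores disk speed.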


Related resources

Scalability Analysis of Declustering Methods for Multidimensional Range Queries

Efficient storage and retrieval of multiattribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high perf...


Latin Hypercubes: A Class of Multidimensional Declustering Techniques

The I/O subsystem is widely accepted as one of the principal bottlenecks for high performance parallel database systems. The emergence of parallel I/O architectures has made the problem of data declustering, i.e. fragmenting a file of records and allocating the pieces to different disks, one of prime importance. This is evident from the growing activity in this area. In this study we focus only ...


A Hierarchical Technique for Constructing Efficient Declustering Schemes for Range Queries

Multi-disk systems, coupled with declustering schemes, have been widely used in various applications to improve I/O performance by enabling parallel disk accesses. A declustering scheme determines how data blocks should be placed among multiple disks to maximize the parallelism. We focus on the problem of declustering grid-structured multidimensional data with the objective of reducing the resp...


Multidimensional Declustering Schemes Using Golden Ratio and Kronecker Sequences

We propose a new declustering scheme for allocating uniform multidimensional data among parallel disks. The scheme, aimed at reducing disk access time for range queries, is based on Golden Ratio Sequences for two dimensions and Kronecker Sequences for higher dimensions. Using exhaustive simulation, we show that, in two dimensions, the worst-case (additive) deviation of the scheme from the optim...


Efficient retrieval of multidimensional datasets through parallel I/O

Many scientific and engineering applications process large multidimensional datasets. An important access pattern for these applications is the retrieval of data corresponding to ranges of values in multiple dimensions. Performance is limited by disks largely due to high disk latencies. Tiling and distributing the data across multiple disks is an effective technique for improving performance th...



Publication date: 2003